Nvidia dominates in gen AI benchmarks, clobbering 2 rival AI chips


Nvidia’s general-purpose GPU chips have once again made a nearly clean sweep of one of the most popular benchmarks for measuring chip performance in artificial intelligence, this time with a new focus on generative AI applications such as large language models (LLMs).

There wasn’t much competition.

Systems put together by SuperMicro, Hewlett Packard Enterprise, Lenovo, and others — packed with as many as eight Nvidia chips — on Wednesday took most of the top honors in the MLPerf benchmark test organized by the MLCommons, an industry consortium.


The test measures how fast machines can produce tokens, process queries, or output samples of data, the task known as AI inference. This is the fifth installment of the inference benchmark, which has run for several years.

This time, the MLCommons updated the speed tests with two tests representing common generative AI uses. One measures how fast the chips run Meta's open-source LLM Llama 3.1 405b, one of the larger gen AI programs in common use.

The MLCommons also added an interactive version of Meta’s smaller Llama 2 70b. That test is meant to simulate what happens with a chatbot, where response time is a factor. The machines are tested for how fast they generate the first token of output from the language model, to simulate the need for a quick response when someone has typed a prompt.
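The interactive test's key metric, time to first token, can be illustrated with a minimal sketch. The generator below is a hypothetical stand-in for a model's streaming output, not part of the actual MLPerf harness:

```python
import time

def measure_ttft(stream):
    """Return the first token from a streaming generator and the
    time elapsed until it arrived (time to first token, TTFT)."""
    start = time.perf_counter()
    first = next(iter(stream))
    return first, time.perf_counter() - start

def fake_token_stream():
    """Hypothetical stand-in for a model's token stream."""
    time.sleep(0.05)  # simulated prefill delay before the first token
    yield "Hello"
    yield " world"

token, ttft = measure_ttft(fake_token_stream())
print(f"first token {token!r} after {ttft * 1000:.0f} ms")
```

In the interactive benchmark, systems must keep this first-token latency below a threshold, which rewards hardware that starts responding quickly rather than only maximizing total throughput.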

A third new test measures the speed of processing graph neural networks, which are problems composed of a bunch of entities and their relations, such as in a social network. 

Graph neural nets have grown in importance as a component of programs that use gen AI. For example, Google’s DeepMind unit used graph nets extensively to make stunning breakthroughs in protein-folding predictions with its AlphaFold 2 model in 2021.

A fourth new test measures how fast LiDAR sensor data can be assembled into a car's map of the road. The MLCommons built its own version of a neural net for the test, combining existing open-source approaches.


The MLPerf competition comprises computers assembled by Lenovo, HPE, and others to strict requirements for the accuracy of neural-net output. Each submitted system reported its best speed, in output produced per second, to the MLCommons. For some tasks the benchmark is instead average latency, how long it takes for the response to come back from the server.

Nvidia’s GPUs produced top results in almost every test in the closed division, where the rules for the software setup are the most strict. 


Competitor AMD, running its MI300X GPU, took the top score in two of the tests of Llama 2 70b. It produced 103,182 tokens per second, significantly better than the second-best result from Nvidia’s newer Blackwell GPU.

That winning AMD system was put together by a new entrant to the MLPerf benchmark, the startup MangoBoost, which makes plug-in cards that can speed data transfer between GPU racks. The company also develops software to improve serving of gen AI, called LLMboost.

Nvidia disputes the comparison of the AMD score to its Blackwell score, citing the need to "normalize" scores across the number of chips and computer "nodes" used in each submission.

Nvidia's director of accelerated computing products, Dave Salvator, said in an email to ZDNET:

“MangoBoost’s results do not reflect an accurate performance comparison against NVIDIA’s results. AMD’s testing applied 4X the number of GPUs – 32 MI300X GPUs – against 8 NVIDIA B200s, yet still only achieved a 3.83% higher result than the NVIDIA submission. NVIDIA’s 8x B200 submission actually outperformed MangoBoost’s x32 AMD MI300X GPUs in the Llama 2 70B server submission.”
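The arithmetic behind Nvidia's objection can be sketched by normalizing each score per GPU. The AMD total and GPU counts come from the article; Nvidia's total below is derived from the stated 3.83% gap, an approximation rather than a figure quoted directly:

```python
# Per-GPU normalization illustrating Nvidia's argument.
amd_total = 103_182       # tokens/sec from 32x MI300X (MLPerf result)
amd_gpus = 32
nvidia_gpus = 8
# Derived: AMD's total was 3.83% higher than Nvidia's 8x B200 total,
# so Nvidia's total is approximately amd_total / 1.0383.
nvidia_total = amd_total / 1.0383

amd_per_gpu = amd_total / amd_gpus          # ~3,224 tokens/sec per GPU
nvidia_per_gpu = nvidia_total / nvidia_gpus # ~12,422 tokens/sec per GPU

print(f"AMD MI300X per GPU:  {amd_per_gpu:,.0f} tokens/sec")
print(f"Nvidia B200 per GPU: {nvidia_per_gpu:,.0f} tokens/sec")
```

On this per-chip view, each B200 delivers roughly four times the throughput of an MI300X in that test, which is the substance of Nvidia's complaint about comparing raw totals from differently sized systems.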


Google also submitted a system, showing off its Trillium chip, the sixth iteration of its in-house Tensor Processing Unit (TPU). That system trailed far behind Nvidia’s Blackwell in a test of how fast the computer could answer queries for the Stable Diffusion image-generation test. 

The latest round of MLPerf benchmarks featured fewer competitors to Nvidia than some past installments. For example, microprocessor giant Intel's Habana unit, which submitted its chips in years past, had no entries this time. Mobile chip giant Qualcomm also sat out this round.

The benchmarks offered some nice bragging rights for Intel, however. Every computer system needs not only the GPU to accelerate the AI math, but also a host processor to run the ordinary work of scheduling tasks and managing memory and storage. 


In the datacenter closed division, Intel’s Xeon microprocessor was the host processor that powered seven of the top 11 systems, versus only three wins for AMD’s EPYC server microprocessor. That represents an improved showing for Intel versus years prior.

The 11th top-performing system, the benchmark of speed to process Meta’s giant Llama 3.1 405b, was built by Nvidia itself without an Intel or AMD microprocessor onboard. Instead, Nvidia used the combined Grace-Blackwell 200 chip, where the Blackwell GPU is connected in the same package with Nvidia’s own Grace microprocessor. 
